Covariate adjustment
Machine learning to optimize precision in the analysis of randomized trials: A journey in pre-specified, yet data-adaptive learning
Balzer, Laura B., van der Laan, Mark J., Petersen, Maya L.
Covariate adjustment is an approach to improve the precision of trial analyses by adjusting for baseline variables that are prognostic of the primary endpoint. Motivated by the SEARCH Universal HIV Test-and-Treat Trial (2013-2017), we tell our story of developing, evaluating, and implementing a machine learning-based approach for covariate adjustment. We provide the rationale for, as well as the practical concerns with, such an approach for estimating marginal effects. Using schematics, we illustrate our procedure: targeted minimum loss-based estimation (TMLE) with Adaptive Pre-specification. Briefly, sample-splitting is used to data-adaptively select the combination of estimators of the outcome regression (i.e., the conditional expectation of the outcome given the trial arm and covariates) and known propensity score (i.e., the conditional probability of being randomized to the intervention given the covariates) that minimizes the cross-validated variance estimate and, thereby, maximizes empirical efficiency. We discuss our approach for evaluating finite-sample performance with parametric and plasmode simulations, pre-specifying the Statistical Analysis Plan, and unblinding in real time on video conference with our colleagues from around the world. We present the results from applying our approach in the primary, pre-specified analysis of 8 recently published trials (2022-2024). We conclude with practical recommendations and an invitation to implement our approach in the primary analysis of your next trial.
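The selection step lends itself to a compact illustration. Below is a minimal sketch, assuming a two-arm trial with a known randomization probability, simple linear working models as the candidate outcome regressions, and cross-validated variance of the efficient influence curve as the selection criterion. The function names (`cv_ic_variance`, `adaptive_prespecification`) and the candidate set are ours, not the authors' implementation, and the full procedure also selects among candidate propensity-score estimators.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

def cv_ic_variance(y, a, X, n_folds=5, seed=0):
    """Cross-validated variance of the efficient influence curve for the
    marginal risk difference, under one candidate outcome regression."""
    p1 = a.mean()                       # randomization probability (known by design)
    ic = np.empty(len(y))
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(y):
        if X is None:                   # unadjusted candidate: arm-specific means
            q1 = np.full(len(test), y[train][a[train] == 1].mean())
            q0 = np.full(len(test), y[train][a[train] == 0].mean())
        else:                           # adjusted candidate: linear working model
            fit = LinearRegression().fit(
                np.column_stack([a[train], X[train]]), y[train])
            q1 = fit.predict(np.column_stack([np.ones(len(test)), X[test]]))
            q0 = fit.predict(np.column_stack([np.zeros(len(test)), X[test]]))
        h = a[test] / p1 - (1 - a[test]) / (1 - p1)    # "clever covariate"
        qa = np.where(a[test] == 1, q1, q0)
        ic[test] = h * (y[test] - qa) + q1 - q0        # influence-curve values
    return ic.var()

def adaptive_prespecification(y, a, W):
    """Pick the adjustment strategy minimizing the cross-validated variance."""
    candidates = {"unadjusted": None}
    candidates.update({f"adjust_W{j}": W[:, [j]] for j in range(W.shape[1])})
    return min(candidates, key=lambda name: cv_ic_variance(y, a, candidates[name]))
```

Because the winning candidate is chosen by a pre-specified rule rather than by looking at effect estimates, the whole selection remains pre-specifiable in the Statistical Analysis Plan.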
- North America > United States > California > Alameda County > Berkeley (0.14)
- Africa > Uganda (0.06)
- Africa > Kenya (0.06)
- (8 more...)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology > HIV (0.72)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Deceptive uses of Artificial Intelligence in elections strengthen support for AI ban
Jungherr, Andreas, Rauchfleisch, Adrian, Wuttke, Alexander
All over the world, political parties, politicians, and campaigns explore how Artificial Intelligence (AI) can help them win elections. However, the effects of these activities are unknown. We propose a framework for assessing AI's impact on elections by considering its application to various campaigning tasks. The electoral uses of AI vary widely, carrying different levels of concern and need for regulatory oversight. To account for this diversity, we group AI-enabled campaigning uses into three categories: campaign operations, voter outreach, and deception. Using this framework, we provide the first systematic evidence from a preregistered representative survey and two preregistered experiments (n = 7,635) on how Americans think about AI in elections and on the effects of specific campaigning choices. We report three significant findings. First, there is a misalignment between the incentives for deceptive practices and their externalities: we cannot count on public opinion to provide strong enough incentives for parties to forgo the tactical advantages of AI-enabled deception. Second, there is a need for regulatory oversight and systematic outside monitoring of electoral uses of AI. Third, regulators should nonetheless account for the diversity of AI uses and not completely disincentivize their electoral use, since elections are times of high public attention on campaigns and their tools of communication. A representative survey of Americans shows that people dislike all kinds of AI uses in campaigns but are more critical of deceptive uses than of those improving campaign operations or voter outreach (Study 1, n = 1,199). A survey experiment shows that, when learning about specific AI uses in campaigns, American respondents reacted much more negatively to deceptive uses (Study 2, n = 1,985).
- Asia > Taiwan (0.04)
- North America > United States > New York (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (8 more...)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
Prognostic Covariate Adjustment for Logistic Regression in Randomized Controlled Trials
Li, Yunfan, Sabbaghi, Arman, Walsh, Jonathan R., Fisher, Charles K.
Randomized controlled trials (RCTs) with binary primary endpoints introduce novel challenges for inferring the causal effects of treatments. The most significant challenge is non-collapsibility, in which the conditional odds ratio estimand under covariate adjustment differs from the unconditional estimand in the logistic regression analysis of RCT data. This issue gives rise to apparent paradoxes, such as the variance of the estimator for the conditional odds ratio from a covariate-adjusted model being greater than the variance of the estimator from the unadjusted model. We address this challenge in the context of adjustment based on predictions of control outcomes from generative artificial intelligence (AI) algorithms, which are referred to as prognostic scores. We demonstrate that prognostic score adjustment in logistic regression increases the power of the Wald test for the conditional odds ratio under a fixed sample size, or alternatively reduces the necessary sample size to achieve a desired power, compared to the unadjusted analysis. We derive formulae for prospective calculations of the power gain and sample size reduction that can result from adjustment for the prognostic score. Furthermore, we utilize g-computation to expand the scope of prognostic score adjustment to inferences on the marginal risk difference, relative risk, and odds ratio estimands. We demonstrate the validity of our formulae via extensive simulation studies that encompass different types of logistic regression model specifications. Our simulation studies also indicate how prognostic score adjustment can reduce the variance of g-computation estimators for the marginal estimands while maintaining frequentist properties such as asymptotic unbiasedness and Type I error rate control. Our methodology can ultimately enable more definitive and conclusive analyses for RCTs with binary primary endpoints.
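The g-computation step described above is simple to express in code. The sketch below, with illustrative names and a single prognostic-score covariate, fits a logistic working model and averages counterfactual predictions to obtain the marginal estimands; it is not the paper's implementation, and standard errors (e.g., via the bootstrap or the delta method) are omitted.

```python
import numpy as np
import statsmodels.api as sm

def gcomp_marginals(y, a, prognostic_score):
    """Marginal estimands from a logistic model adjusted for a prognostic score."""
    X = sm.add_constant(np.column_stack([a, prognostic_score]))
    fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
    X1, X0 = X.copy(), X.copy()
    X1[:, 1], X0[:, 1] = 1.0, 0.0       # counterfactual treatment assignments
    r1, r0 = fit.predict(X1).mean(), fit.predict(X0).mean()  # standardized risks
    return {"risk_difference": r1 - r0,
            "relative_risk": r1 / r0,
            "marginal_odds_ratio": (r1 / (1 - r1)) / (r0 / (1 - r0))}
```

Because the counterfactual predictions are averaged over the trial sample, the returned quantities are marginal and therefore sidestep the non-collapsibility of the conditional odds ratio.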
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > Greenland (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (2 more...)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
Bayesian Prognostic Covariate Adjustment With Additive Mixture Priors
Vanderbeek, Alyssa M., Sabbaghi, Arman, Walsh, Jon R., Fisher, Charles K.
Effective and rapid decision-making from randomized controlled trials (RCTs) requires unbiased and precise treatment effect inferences. Two strategies to address this requirement are to adjust for covariates that are highly correlated with the outcome, and to leverage historical control information via Bayes' theorem. We propose a new Bayesian prognostic covariate adjustment methodology, referred to as Bayesian PROCOVA, that combines these two strategies. Covariate adjustment in Bayesian PROCOVA is based on generative artificial intelligence (AI) algorithms that construct a digital twin generator (DTG) for RCT participants. The DTG is trained on historical control data and yields a digital twin (DT) probability distribution for each RCT participant's outcome under the control treatment. The expectation of the DT distribution, referred to as the prognostic score, defines the covariate for adjustment. Historical control information is leveraged via an additive mixture prior with two components: an informative prior probability distribution specified based on historical control data, and a weakly informative prior distribution. The mixture weight determines the extent to which posterior inferences are drawn from the informative component versus the weakly informative component. This weight has a prior distribution as well, so the entire additive mixture prior is completely pre-specifiable without involving any RCT information. We establish an efficient Gibbs algorithm for sampling from the posterior distribution, and derive closed-form expressions for the posterior mean and variance of the treatment effect parameter in Bayesian PROCOVA, conditional on the weight. We evaluate the efficiency gains of Bayesian PROCOVA, via its bias control and variance reduction relative to frequentist PROCOVA, in simulation studies that encompass different degrees of discrepancy between the historical and RCT data. These gains translate into smaller RCTs.
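The additive mixture prior has a clean conjugate analogue that conveys the key mechanics. The sketch below treats the trial's effect estimate as a Gaussian likelihood and shows how the data update both the component posteriors and the weight placed on the informative component. This is a textbook normal-mixture calculation, not the paper's Gibbs sampler, and all names and default values are illustrative.

```python
import numpy as np
from scipy import stats

def mixture_posterior(theta_hat, se, informative=(0.3, 0.1), weak=(0.0, 10.0),
                      prior_weight=0.5):
    """Posterior for a scalar effect under a two-component normal mixture prior.
    theta_hat, se: the trial's effect estimate and its standard error;
    informative/weak: (mean, sd) of each prior component (illustrative)."""
    means, sds, logmarg = [], [], []
    for m0, s0 in (informative, weak):
        post_var = 1.0 / (1.0 / s0**2 + 1.0 / se**2)   # conjugate normal update
        means.append(post_var * (m0 / s0**2 + theta_hat / se**2))
        sds.append(np.sqrt(post_var))
        # each component's marginal likelihood updates the mixture weight
        logmarg.append(stats.norm.logpdf(theta_hat, m0, np.hypot(s0, se)))
    logw = np.log([prior_weight, 1.0 - prior_weight]) + np.array(logmarg)
    w = np.exp(logw - logw.max()); w /= w.sum()
    mu = float(np.dot(w, means))
    var = float(np.dot(w, np.square(sds) + np.square(means)) - mu**2)
    return mu, np.sqrt(var), w[0]   # w[0]: posterior weight on the informative part
```

When the trial estimate conflicts with the informative component, its marginal likelihood shrinks and posterior mass shifts to the weakly informative component, which is the bias-control behavior the abstract describes.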
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Law (0.68)
- Government > Regional Government > North America Government > United States Government > FDA (0.47)
Using Auxiliary Data to Boost Precision in the Analysis of A/B Tests on an Online Educational Platform: New Data and New Results
Sales, Adam C., Prihar, Ethan B., Gagnon-Bartsch, Johann A., Heffernan, Neil T.
Randomized A/B tests within online learning platforms represent an exciting direction in the learning sciences. With minimal assumptions, they allow causal effect estimation without confounding bias and exact statistical inference even in small samples. However, experimental samples and/or treatment effects are often small, A/B tests are underpowered, and effect estimates are overly imprecise. Recent methodological advances have shown that power and statistical precision can be substantially boosted by coupling design-based causal estimation to machine-learning models of rich log data from historical users who were not in the experiment. Estimates using these techniques remain unbiased and inference remains exact without any additional assumptions. This paper reviews those methods and applies them to a new dataset including over 250 randomized A/B comparisons conducted within ASSISTments, an online learning platform. We compare results across experiments using four novel deep-learning models of auxiliary data and show that incorporating auxiliary data into causal estimates is roughly equivalent to increasing the sample size by 20% on average, or as much as 50-80% in some cases, relative to t-tests, and by about 10% on average, or as much as 30-50%, relative to unbiased estimates from cutting-edge machine-learning methods that use only data from the experiments. We show that the gains can be even larger for estimating subgroup effects, hold even when the remnant is unrepresentative of the A/B test sample, and extend to post-stratification estimators of population effects. In randomized A/B tests on an online learning platform, students are randomized between different educational conditions or strategies, and their subsequent educational outcomes of interest are compared between conditions. For instance, in Harrison et al. (2020), the authors designed, prior to the students' work, four different educational conditions, which differed in how the numbers and symbols in arithmetic expressions were spaced. As students logged on to the platform during their usual schoolwork, they were each individually randomized to one of the four conditions and completed their work under that condition. Data and code used in this work can be found at https://osf.io/k8ph9/.
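The core trick, training on the remnant of historical users and adjusting the experimental comparison with the resulting predictions, fits in a few lines. The sketch below implements one simple member of this family of estimators, a residualized difference-in-means, under illustrative names and an illustrative model choice; the paper's estimators and variance calculations are more elaborate.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def remnant_adjusted_ate(X_remnant, y_remnant, X_trial, y_trial, a_trial):
    """Difference-in-means on residuals from a model trained outside the trial."""
    model = GradientBoostingRegressor().fit(X_remnant, y_remnant)
    # predictions depend only on remnant data and pre-treatment features,
    # so randomization-based unbiasedness of the contrast is preserved
    resid = y_trial - model.predict(X_trial)
    est = resid[a_trial == 1].mean() - resid[a_trial == 0].mean()
    se = np.sqrt(resid[a_trial == 1].var(ddof=1) / (a_trial == 1).sum()
                 + resid[a_trial == 0].var(ddof=1) / (a_trial == 0).sum())
    return est, se
```

A better predictive model shrinks the residual variance and hence the standard error, which is exactly the sample-size-equivalent gain the abstract quantifies; a useless model leaves the estimator no worse than the raw difference-in-means.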
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Michigan (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Causal Bias Quantification for Continuous Treatment
Detommaso, Gianluca, Brückner, Michael, Schulz, Philip, Chernozhukov, Victor
In this work we develop a novel characterization of the marginal causal effect and causal bias in the continuous treatment setting. We show that both can be expressed as expectations with respect to a conditional probability distribution, which can be estimated via standard statistical and probabilistic methods. All terms in the expectations can be computed via automatic differentiation, even for highly non-linear models. We further develop a new complete criterion for identifiability of causal effects via covariate adjustment, showing that the bias equals zero if the criterion is met. We study the effectiveness of our framework in three different scenarios: linear models under confounding, overcontrol, and endogenous selection bias; a non-linear model where full identifiability cannot be achieved because of missing data; and a simulated medical study of statins and atherosclerotic cardiovascular disease.
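For the first scenario, linear models under confounding, the causal bias has a closed form that a short simulation makes concrete. The sketch below uses our own notation, not the paper's autodiff framework: the observational slope of the outcome on a continuous treatment exceeds the causal coefficient by gamma * Cov(T, U) / Var(T), and adjusting for the confounder removes the bias, consistent with a complete adjustment criterion being met.

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha, beta, gamma = 200_000, 1.0, 2.0, 1.5
u = rng.normal(size=n)                    # confounder: u -> t and u -> y
t = alpha * u + rng.normal(size=n)        # continuous treatment
y = beta * t + gamma * u + rng.normal(size=n)

C = np.cov(np.vstack([t, u, y]))          # sample covariance matrix
naive = C[0, 2] / C[0, 0]                 # observational slope of y on t
bias = gamma * C[0, 1] / C[0, 0]          # analytic bias: gamma*Cov(t,u)/Var(t)
adjusted = np.linalg.lstsq(
    np.column_stack([t, u, np.ones(n)]), y, rcond=None)[0][0]

print(f"naive={naive:.3f}  beta+bias={beta + bias:.3f}  adjusted={adjusted:.3f}")
# naive and beta+bias agree (about 2.75); adjusted recovers beta = 2.0
```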
- North America > United States > Massachusetts (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
Bayesian prognostic covariate adjustment
Walsh, David, Schuler, Alejandro, Hall, Diana, Walsh, Jon, Fisher, Charles
Historical data about disease outcomes can be integrated into the analysis of clinical trials in many ways. We build on existing literature that uses prognostic scores from a predictive model to increase the efficiency of treatment effect estimates via covariate adjustment. Here we go further, utilizing a Bayesian framework that combines prognostic covariate adjustment with an empirical prior distribution learned from the predictive performances of the prognostic model on past trials. The Bayesian approach interpolates between prognostic covariate adjustment with strict type I error control when the prior is diffuse, and a single-arm trial when the prior is sharply peaked. This method is shown theoretically to offer a substantial increase in statistical power, while limiting the type I error rate under reasonable conditions. We demonstrate the utility of our method in simulations and with an analysis of a past Alzheimer's disease clinical trial.
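The interpolation between a standard covariate-adjusted analysis and a single-arm-like analysis can be seen in a one-line conjugate update. In the sketch below (illustrative names and numbers, not the paper's estimator), a diffuse prior returns essentially the frequentist prognostic-adjusted estimate, while a sharply peaked prior pulls the answer toward the historical value.

```python
def bayes_shrink(adjusted_est, adjusted_se, prior_mean, prior_sd):
    """Conjugate normal-normal update of a prognostic-adjusted trial estimate."""
    post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / adjusted_se**2)
    post_mean = post_var * (prior_mean / prior_sd**2
                            + adjusted_est / adjusted_se**2)
    return post_mean, post_var**0.5

# diffuse prior: posterior is essentially the trial's adjusted estimate
print(bayes_shrink(0.25, 0.10, prior_mean=0.30, prior_sd=1e3))
# sharp prior: posterior is essentially the historical (prior) value
print(bayes_shrink(0.25, 0.10, prior_mean=0.30, prior_sd=1e-3))
```

The paper's contribution is, in effect, a principled way to set this prior empirically from the prognostic model's past performance, so that type I error remains controlled under reasonable conditions.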
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Canada (0.04)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
Increasing the efficiency of randomized trial estimates via linear adjustment for a prognostic score
Schuler, Alejandro, Walsh, David, Hall, Diana, Walsh, Jon, Fisher, Charles
Estimating causal effects from randomized experiments is central to clinical research. Reducing the statistical uncertainty in these analyses is an important objective for statisticians. Registries, prior trials, and health records constitute a growing compendium of historical data on patients under standard-of-care conditions that may be exploitable to this end. However, most methods for historical borrowing achieve reductions in variance by sacrificing strict type-I error rate control. Here, we propose a use of historical data that exploits linear covariate adjustment to improve the efficiency of trial analyses without incurring bias. Specifically, we train a prognostic model on the historical data, then estimate the treatment effect using a linear regression while adjusting for the trial subjects' predicted outcomes (their prognostic scores). We prove that, under certain conditions, this prognostic covariate adjustment procedure attains the minimum variance possible among a large class of estimators. When those conditions are not met, prognostic covariate adjustment is still more efficient than raw covariate adjustment and the gain in efficiency is proportional to a measure of the predictive accuracy of the prognostic model. We demonstrate the approach using simulations and a reanalysis of an Alzheimer's Disease clinical trial and observe meaningful reductions in mean-squared error and the estimated variance. Lastly, we provide a simplified formula for asymptotic variance that enables power and sample size calculations that account for the gains from the prognostic model for clinical trial design.
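The procedure itself is short: train the prognostic model on historical control data, score the trial participants, and adjust linearly. A minimal sketch follows; the model choice and variable names are illustrative (the approach is agnostic to the prediction method), and it assumes a continuous outcome.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import RandomForestRegressor

def prognostic_adjusted_effect(X_hist, y_hist, X_trial, y_trial, a_trial):
    """OLS treatment-effect estimate adjusting for a learned prognostic score."""
    prog = RandomForestRegressor(random_state=0).fit(X_hist, y_hist)
    m = prog.predict(X_trial)                 # prognostic scores for trial subjects
    design = sm.add_constant(np.column_stack([a_trial, m]))
    fit = sm.OLS(y_trial, design).fit()
    return fit.params[1], fit.bse[1]          # effect estimate and its standard error
```

Because the historical data enter only through the prognostic score, which is a pre-treatment covariate, randomization still guarantees unbiasedness, and the efficiency gain grows with the predictive accuracy of the prognostic model, as the abstract states.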
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Canada (0.04)
- (3 more...)
- Research Report > Strength High (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Government > Regional Government > North America Government > United States Government (0.93)